Convergence theory for Hermite approximations under adaptive coordinate transformations
Recent work has shown that parameterizing and optimizing coordinate transformations with normalizing flows, i.e., invertible neural networks, can significantly accelerate the convergence of spectral approximations. We present the first error estimates for approximating functions by Hermite expansions composed with adaptive coordinate transformations. Our analysis establishes an equivalence principle: approximating a function $f$ in the span of the transformed basis is equivalent to approximating the pullback of $f$ in the span of Hermite functions. This allows us to leverage the classical approximation theory of Hermite expansions to derive error estimates in transformed coordinates in terms of the regularity of the pullback. We then give an example demonstrating how a nonlinear coordinate transformation can enhance the convergence of Hermite expansions: focusing on smooth functions decaying along the real axis, we construct a monotone transport map that aligns the decay of the target function with that of the Hermite basis, which guarantees spectral convergence rates for the corresponding Hermite expansion. Our analysis provides theoretical insight into the convergence behavior of adaptive Hermite approximations based on normalizing flows, as recently explored in the computational quantum physics literature.
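The equivalence principle admits a one-line statement. Below is a minimal LaTeX sketch under assumed notation ($T$ an invertible coordinate transformation, $\{h_n\}$ the Hermite functions, $f \circ T^{-1}$ the pullback); the Jacobian weights that a rigorous change of variables would attach to the norms are suppressed:

```latex
% Sketch of the equivalence principle (notation assumed, Jacobian weights suppressed):
% the best approximation of f in the transformed basis {h_n ∘ T} has the same error
% as the best approximation of the pullback f ∘ T^{-1} in the plain Hermite basis.
\[
  \inf_{c_0,\dots,c_N} \Bigl\| f - \sum_{n=0}^{N} c_n \, (h_n \circ T) \Bigr\|
  \;=\;
  \inf_{c_0,\dots,c_N} \Bigl\| f \circ T^{-1} - \sum_{n=0}^{N} c_n \, h_n \Bigr\|.
\]
```

In this form, any classical Hermite error estimate for the pullback transfers directly to the transformed expansion, which is why the regularity and decay of $f \circ T^{-1}$ govern the rate.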
Adaptive multi-fidelity optimization with fast learning rates
Fiegel, Côme; Gabillon, Victor; Valko, Michal
In multi-fidelity optimization, biased approximations of the target function are available at varying costs. This paper studies the problem of optimizing a locally smooth function under a limited budget, where the learner must trade off the cost of these approximations against their bias. We first prove lower bounds on the simple regret under different assumptions on the fidelities, expressed through a cost-to-bias function. We then present the Kometo algorithm, which achieves the same rates up to additional logarithmic factors without any knowledge of the function smoothness or the fidelity assumptions, and improves on previously proven guarantees. Finally, we show empirically that our algorithm outperforms previous multi-fidelity optimization methods without knowledge of problem-dependent parameters.
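To make the cost-bias tradeoff concrete, here is a small hypothetical Python sketch; the fidelity model (bias $\zeta(m)$, cost $\lambda(m)$), the target function, and the coarse-to-fine allocation rule are illustrative assumptions, not the Kometo algorithm:

```python
import numpy as np

def f(x):
    """Hypothetical target function on [0, 1]."""
    return np.sin(5 * x) * (1 - x)

def evaluate(x, m, rng):
    """Fidelity-m oracle: the bias shrinks as the cost grows (assumed model)."""
    zeta = 0.5 / (m + 1)                 # bias of fidelity m
    lam = float(m + 1)                   # cost of fidelity m
    value = f(x) + zeta * np.cos(20 * x) + 0.01 * rng.standard_normal()
    return value, lam

def optimize(budget=150.0, n_grid=50, seed=0):
    """Naive coarse-to-fine allocation: sweep cheap, biased fidelities first,
    then re-evaluate only the best candidates at higher fidelity."""
    rng = np.random.default_rng(seed)
    xs = list(np.linspace(0, 1, n_grid))
    spent, scores = 0.0, {}
    for m in (0, 2, 5):                  # increasing fidelity
        for x in xs:
            y, cost = evaluate(x, m, rng)
            if spent + cost > budget:
                return max(scores, key=scores.get)
            spent += cost
            scores[x] = y                # latest (least biased) estimate wins
        xs = sorted(scores, key=scores.get, reverse=True)[: max(2, len(xs) // 4)]
    return max(scores, key=scores.get)

print(optimize())
```

Even this crude scheme exhibits the essential tension the paper formalizes: spending the budget on cheap, biased queries covers the domain, while spending it on expensive, accurate queries resolves the optimum.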
Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate
We study the problem of learning a low-degree spherical polynomial of degree $k_0 = \Theta(1) \ge 1$ defined on the unit sphere in $\mathbb{R}^d$ by training an over-parameterized two-layer neural network with an augmented feature. Our main result is a significantly improved sample complexity for learning such low-degree polynomials. We show that, for any regression risk $\epsilon \in (0, \Theta(d^{-k_0})]$, an over-parameterized two-layer neural network trained by a novel Gradient Descent with Projection (GDP) requires a sample complexity of $n \asymp \Theta(\log(4/\delta) \cdot d^{k_0}/\epsilon)$ with probability $1-\delta$ for $\delta \in (0,1)$, in contrast with the representative sample complexity $\Theta(d^{k_0} \max\{\epsilon^{-2}, \log d\})$. Moreover, this sample complexity is nearly unimprovable, since the trained network attains a nonparametric regression risk of order $\log(4/\delta) \cdot \Theta(d^{k_0}/n)$ with probability at least $1-\delta$. On the other hand, the minimax optimal rate for the regression risk with a kernel of rank $\Theta(d^{k_0})$ is $\Theta(d^{k_0}/n)$, so the rate of the nonparametric regression risk of the network trained by GDP is nearly minimax optimal. When the ground-truth degree $k_0$ is unknown, we present a novel, provable adaptive degree selection algorithm that identifies the true degree and achieves the same nearly optimal regression rate. To the best of our knowledge, this is the first time that a nearly optimal risk bound is obtained by training an over-parameterized neural network with a popular activation function (ReLU) together with an algorithmic guarantee for learning low-degree spherical polynomials. Owing to the feature learning capability of GDP, our results go beyond the regular Neural Tangent Kernel (NTK) limit.
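As a generic illustration of the kind of training loop the abstract describes (and emphatically not the paper's GDP algorithm), the following Python sketch trains an over-parameterized two-layer ReLU network by gradient descent and applies a projection step after each update; the synthetic degree-2 target and the norm-ball projection on the outer-layer weights are assumptions made for the sake of the example:

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, n = 8, 256, 512

# Synthetic data: a degree-2 spherical polynomial as the target (illustrative).
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # points on the unit sphere
y = (X[:, 0] * X[:, 1]) * np.sqrt(d)            # low-degree target

# Two-layer ReLU network f(x) = a^T relu(W x).
W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

def predict(X):
    return np.maximum(X @ W.T, 0.0) @ a

lr, radius = 0.05, 1.0
for step in range(500):
    H = np.maximum(X @ W.T, 0.0)                # hidden activations
    r = H @ a - y                               # residuals
    grad_a = H.T @ r / n
    grad_W = ((r[:, None] * (H > 0) * a).T @ X) / n
    a -= lr * grad_a
    W -= lr * grad_W
    # Projection step: pull the outer-layer weights back onto a norm ball
    # (the constraint set here is an assumption, not the paper's).
    norm_a = np.linalg.norm(a)
    if norm_a > radius:
        a *= radius / norm_a

print("train MSE:", np.mean((predict(X) - y) ** 2))
```

The projection is what distinguishes the loop from plain gradient descent: after every step, the iterate is mapped back onto a constraint set, which is the mechanism a projected method uses to keep the learned network in a restricted class.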
Kriging via variably scaled kernels
Audone, Gianluca; Marchetti, Francesco; Perracchione, Emma; Rossini, Milvia
Classical Gaussian processes and Kriging models are commonly based on stationary kernels, whereby correlations between observations depend exclusively on the relative distance between scattered data. While this assumption ensures analytical tractability, it limits the ability of Gaussian processes to represent heterogeneous correlation structures. In this work, we investigate variably scaled kernels as an effective tool for constructing non-stationary Gaussian processes by explicitly modifying the correlation structure of the data. Through a scaling function, variably scaled kernels alter the correlations between data and enable the modeling of targets exhibiting abrupt changes or discontinuities. We analyse the resulting predictive uncertainty via the variably scaled kernel power function and clarify the relationship between constructions based on variably scaled kernels and classical non-stationary kernels. Numerical experiments demonstrate that Gaussian processes based on variably scaled kernels yield improved reconstruction accuracy and provide uncertainty estimates that reflect the underlying structure of the data.
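A compact Python sketch may help; it assumes the standard variably scaled kernel construction, in which a scaling function $\psi$ augments every point with an extra coordinate before a stationary kernel is applied, so that $K_\psi(x, y) = K((x, \psi(x)), (y, \psi(y)))$. The target, the scaling function, and the nugget value are illustrative choices:

```python
import numpy as np

def gaussian_kernel(A, B, length_scale=0.3):
    """Stationary Gaussian (RBF) kernel between row-stacked point sets."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length_scale**2))

def vsk_kernel(x, y, psi, **kw):
    """Variably scaled kernel: augment each point with the scaling-function
    coordinate, then apply the stationary kernel in the augmented space."""
    ax = np.column_stack([x, psi(x)])
    ay = np.column_stack([y, psi(y)])
    return gaussian_kernel(ax, ay, **kw)

# Illustrative target with a jump; the scaling function mimics the discontinuity,
# which is the typical way VSKs encode known abrupt changes.
f = lambda x: np.where(x < 0.5, np.sin(6 * x), 2 + np.cos(6 * x))
psi = lambda x: np.where(x < 0.5, 0.0, 1.0)

x_train = np.linspace(0, 1, 25)[:, None]
y_train = f(x_train[:, 0])

# Kriging / kernel interpolation with a tiny nugget for numerical stability.
K = vsk_kernel(x_train, x_train, lambda x: psi(x[:, 0]))
coef = np.linalg.solve(K + 1e-10 * np.eye(len(K)), y_train)

x_test = np.linspace(0, 1, 200)[:, None]
pred = vsk_kernel(x_test, x_train, lambda x: psi(x[:, 0])) @ coef
print("max abs error:", np.abs(pred - f(x_test[:, 0])).max())
```

Choosing $\psi$ to jump exactly where the target jumps makes augmented points on opposite sides of the discontinuity far apart, so their correlation drops sharply; this is precisely how the scaling function encodes a non-stationary correlation structure into an otherwise stationary kernel.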
Understanding the Limitations of Deep Models for Molecular Property Prediction: Insights and Solutions
Molecular Property Prediction (MPP) is a critical task in computational drug discovery, aimed at identifying molecules with desirable pharmacological and ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties. Machine learning has been widely applied in this fast-growing field, with two types of models commonly employed: traditional non-deep models and deep models.